-
Notifications
You must be signed in to change notification settings - Fork 25
ci(perf): Track Firewood Performance via AvalancheGo Benchmarks #1493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implements workflow to trigger C-Chain reexecution benchmarks in AvalancheGo and track Firewood performance over time. Supports task-based and custom parameter modes. Results stored in GitHub Pages via github-action-benchmark.
|
Unblocked, test runs: |
| # Structure on benchmark-data branch (see track-performance.yml for how this is populated): | ||
| # bench/ - Official benchmark history (main branch only) | ||
| # dev/bench/{branch}/ - Feature branch benchmarks (experimental) | ||
| - name: Include benchmark data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build working: https://github.com/ava-labs/firewood/actions/runs/21334822723/job/61405063779
Deploy only works from main due to environment protection rules.
…nputs_match reducing cognitive load
| "**/tests/compile_*/**", | ||
| "justfile", | ||
| "scripts/run-just.sh", | ||
| "scripts/**", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added new benchmark script (bench-cchain-reexecution.sh) which
caused CI to fail with "Config does not cover the file". Shell
scripts aren't checked for license content (only .rs/.go/.h are),
but must be explicitly listed in the config. Exclude entire
scripts/ directory to avoid listing each script individually.
https://github.com/ava-labs/firewood/blob/main/.github/check-license-headers.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early review, could use another pass
There's a lot of code here and I barely was able to complete the review in my maximum review time. Please consider breaking this up for more timely reviews, especially if the review is anything larger than this.
I'm also a little confused about how we track which firewood and avalanchego versions we ran the test against.
Let's say the performance loss was due to a change in avalanchego, how can we know that?
| GH_TOKEN: ${{ secrets.FIREWOOD_AVALANCHEGO_GITHUB_TOKEN }} | ||
| # Custom mode (ignored when test is specified) | ||
| CONFIG: ${{ inputs.config }} | ||
| START_BLOCK: ${{ inputs.start-block }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like if you set START_BLOCK you better also be setting CURRENT_STATE_DIR_SRC to let it know where to get the bootstrap database, is that correct?
If so, we should verify that either neither is provided or both are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CURRENT_STATE_DIR is optional, not required because you might want to start from Genesis.
| # CURRENT_STATE_DIR_SRC (optional) S3 state directory (empty = genesis run) |
Validation:
firewood/scripts/bench-cchain-reexecution.sh
Line 219 in cb0f6be
| [[ -z "${START_BLOCK:-}${END_BLOCK:-}${BLOCK_DIR_SRC:-}" ]] && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to start from genesis, then you either must set START_BLOCK to 0 (1?) or not set it. So, if START_BLOCK is not 0, then CURRENT_STATE_DIR_SRC must be set. Isn't that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close, START_BLOCK should be 1 (not 0) for Genesis, and when CURRENT_STATE_DIR_SRC is empty it means starting from genesis (no pre-existing state to bootstrap from). So the valid combinations are:
- Genesis run:
START_BLOCK=1,CURRENT_STATE_DIR_SRCempty - Resume run:
START_BLOCK=N,CURRENT_STATE_DIR_SRCpoints to state at block N-1
Remove local developer tooling (justfile recipe, flake.nix, METRICS.md) to reduce PR scope. These will be submitted in a follow-up PR after the CI workflow changes are merged.
I've split into two PRs per your feedback:
On version tracking This is a conscious tradeoff to get data flowing now with minimal setup, then iterate to something more robust (S3 storage, richer metadata, export for analysis). Planning to revisit after ~2 weeks of data collection. For now, each run logs Firewood / Avalanchego refs in the GitHub Actions summary, so the info exists just not queryable. Added your concern to the tracking doc as something to think through for the next iteration. How does that sound? Summary example: https://github.com/ava-labs/firewood/actions/runs/21334552166 |
|
Executing: Triggered reexecution benchmark in AvalancheGo https://github.com/ava-labs/avalanchego/actions/runs/21408080471 |
scripts/bench-cchain-reexecution.sh
Outdated
| err() { | ||
| if [[ "${GITHUB_ACTIONS:-}" == "true" ]]; then | ||
| echo "::error::$1" | ||
| else | ||
| echo "error: $1" >&2 | ||
| fi | ||
| } | ||
|
|
||
| die() { err "$1"; exit 1; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we combine these two into one (i.e. just append the exit code to err and remove die)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping them separate as err logs without exiting (useful if we need to log multiple errors before bailing), die is the "fatal" version. Clearer intent at call sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place we call err in bench_cchain_reexecution.sh is when we pass in an unknown command and even then, we exit afterwards with code 1.
Clearer intent at call sites.
I think replacing uses of die with err and exit 1 would be clearer in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Done
| GH_TOKEN: ${{ secrets.FIREWOOD_AVALANCHEGO_GITHUB_TOKEN }} | ||
| # Custom mode (ignored when test is specified) | ||
| CONFIG: ${{ inputs.config }} | ||
| START_BLOCK: ${{ inputs.start-block }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to start from genesis, then you either must set START_BLOCK to 0 (1?) or not set it. So, if START_BLOCK is not 0, then CURRENT_STATE_DIR_SRC must be set. Isn't that correct?
…ith err and intentional exit code status
Why
Track C-Chain reexecution benchmark performance over time. Catch regressions before production.
Closes #1494
How
Firewood → triggers AvalancheGo benchmark → downloads results → publishes to GitHub Pages
Changes
scripts/bench-cchain-reexecution.sh.github/workflows/track-performance.ymlbench/, branches →dev/bench/{branch}/)Usage
FIREWOOD_REF=v0.0.18explicitly. Without it, the workflow builds fromHEAD, which currently fails due to changes in FFI layerRelated
AvalancheGo: ava-labs/avalanchego#4650
Local iteration PR: #1642
Followup: #1639